Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Error Handling in Member Assignment to Ensure Seamless Processing #427

Merged
merged 2 commits into from
Aug 20, 2024

Conversation

sciclon2
Copy link
Contributor

@sciclon2 sciclon2 commented Mar 7, 2024

Summary

This pull request fixes an issue with error handling in member assignment, ensuring that data processing remains uninterrupted and error-free.

The code would fail and skip the pending metrics. For example, if the Confluent Schema Registry created a group named "schema-registry" but used the protocolType "sr", it would cause a silent failure, breaking the loop of groups to scrape the metrics.

@danielqsj
Copy link
Owner

According to https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-GroupMembershipAPI, the protocol type consume is not mandatory, right ?

@danielqsj
Copy link
Owner

Consumer groups: Below we define the embedded protocol used by consumer groups. We recommend all consumer implementations follow this format so that tooling will work correctly across all clients.

ProtocolType => "consumer" --

@sciclon2
Copy link
Contributor Author

sciclon2 commented Mar 16, 2024

Sure! Here's an improved version of your message:

Hey @danielqsj, you're absolutely right. I realized that I forgot to close this PR as I identified the root cause of the issue. Essentially, the problem lies in the handling of errors when we encounter them in assignment, err := member.GetMemberAssignment(). Currently when it returns an error we are prematurely returning instead of continuing the loop , which leads to potential issues such as skipping metrics randomly.

It's worth noting that errors can indeed arise in this context, as seen with the Confluent Schema Registry scenario I mentioned earlier in this PR summary. To address this, we should replace "return" with "continue" here:

if err != nil {
    klog.Warningf("Cannot get GetMemberAssignment of group member %v: %v", member, err)
    return // It should continue here instead
}

Making this adjustment will ensure that the process continues seamlessly even in the presence of errors, preventing any unexpected disruptions to the flow of metrics and associated functionalities.

We can see the error of sr-1-* members
Screenshot 2024-03-16 at 07 48 51

Addressing comments
@sciclon2 sciclon2 changed the title Skip unknown protocolType in consumer groups Fix Error Handling in Member Assignment to Ensure Seamless Processing Mar 16, 2024
@danielqsj
Copy link
Owner

LGTM, thanks @sciclon2

@danielqsj danielqsj merged commit b41c0c4 into danielqsj:master Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants